62 research outputs found

    Improving the Latency and Throughput of ZooKeeper Atomic Broadcast

    ZooKeeper is a crash-tolerant system that offers fundamental services to Internet-scale applications, thereby reducing the development and hosting effort of the latter. It consists of >3 servers that form a replicated state machine. Maintaining these replicas in a mutually consistent state requires executing an atomic broadcast protocol, Zab, so that concurrent requests for state changes are serialised identically at all replicas before being acted upon. Thus, ZooKeeper performance for update operations is determined by Zab performance. We contribute by presenting two easy-to-implement Zab variants, called ZabAC and ZabAA. They are designed to offer small atomic-broadcast latencies and to reduce the processing load on the primary node that plays a leading role in Zab. The former improves ZooKeeper performance and the latter enables ZooKeeper to face more challenging load conditions.
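
The abstract's point about identical serialisation can be illustrated with a minimal sketch. It is not Zab, ZabAC or ZabAA; it only assumes that some ordering mechanism has already given each update a sequence number. The `Replica` class, the `broadcast` helper and the sample updates are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical replica holding a key-value state (not ZooKeeper's API)."""
    state: dict = field(default_factory=dict)
    next_seq: int = 0                            # next sequence number to apply
    pending: dict = field(default_factory=dict)  # updates that arrived early

    def deliver(self, seq, update):
        """Buffer the update, then apply every update that is next in order."""
        self.pending[seq] = update
        while self.next_seq in self.pending:
            key, value = self.pending.pop(self.next_seq)
            self.state[key] = value
            self.next_seq += 1


def broadcast(replicas, updates, arrival_orders):
    """Deliver the same numbered updates to each replica in its own arrival order."""
    for replica, order in zip(replicas, arrival_orders):
        for seq in order:
            replica.deliver(seq, updates[seq])


# Assumed input: the index of each update is its agreed sequence number.
updates = [("x", 1), ("x", 2), ("y", 3)]
replicas = [Replica(), Replica(), Replica()]
# Messages reach each replica in a different network order.
arrival_orders = [[0, 1, 2], [2, 0, 1], [1, 2, 0]]
broadcast(replicas, updates, arrival_orders)
```

Because every replica applies updates strictly by sequence number, all three end in the same state even though the messages arrived in different orders. Agreeing on those sequence numbers despite crashes is the hard part, and the sketch leaves it out.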

    Design and development of algorithms for fault tolerant distributed systems

    PhD Thesis. This thesis describes the design and development of algorithms for fault tolerant distributed systems. The development of such algorithms requires making assumptions about the types of component faults for which tolerance is to be provided. Such assumptions must be specified accurately. To this end, this thesis develops a classification of faults in systems. This fault classification identifies a range of fault types from the most restricted to the least restricted. For each fault type, an algorithm for reaching distributed agreement in the presence of a bounded number of faulty processors is developed, and thus a family of agreement algorithms is presented. The influence of the various fault types on the complexities of these algorithms is discussed. Early stopping algorithms are also developed for selected fault types and the influence of fault types on the early stopping conditions of the respective algorithms is analysed. The problem of evaluating the performance of distributed replicated systems which will require agreement algorithms is considered next. As a first step in the direction of meeting this challenging task, a pipeline triple modular redundant system is considered and analytical methods are derived to evaluate the performance of such a system. Finally, the accuracy of these methods is examined using computer simulations. UK Science and Engineering Research Council (SERC), DELTA-4 consortium of ESPIRI

    DIGITALIZATION OF ENTERPRISE WITH ENSURING STABILITY AND RELIABILITY

    The article is devoted to the development of an information system for automating the business processes of a modern enterprise while ensuring stability and reliability, which are implemented by applications developed by the authors. The goal is to improve the core digitalization processes of enterprises for sustainable functioning. The authors carried out an in-depth analysis and described the main stages of the enterprise digitalization process: document approval, business processes of personnel management, etc. The architecture of the information system, a description of its business processes, and the principles of reliability and fault tolerance of the system have been developed. The developed desktop client application connects the enterprise's working computers to the information system through a local network with access to the application server. This reduces damage from accidental or deliberate incorrect actions of users and administrators and provides separation of protection, a variety of means of protection, and simplicity and manageability of the information system and its security system.

    Know your customer: balancing innovation and regulation for financial inclusion

    Financial inclusion depends on providing adjusted services for citizens with disclosed vulnerabilities. At the same time, the financial industry needs to adhere to a strict regulatory framework, which is often in conflict with the desire for inclusive, adaptive, and privacy-preserving services. In this article we study how this tension impacts the deployment of privacy-sensitive technologies aimed at financial inclusion. We conduct a qualitative study with banking experts to understand their perspectives on service development for financial inclusion. We build and demonstrate a prototype solution based on open source decentralized identifiers and verifiable credentials software and report on feedback from the banking experts on this system. The technology is promising because it allows selective disclosure of vulnerabilities under the full control of the individual. This supports GDPR requirements, but at the same time, there is a clear tension between introducing these technologies and fulfilling other regulatory requirements, particularly with respect to 'Know Your Customer.' We consider the policy implications stemming from these tensions and provide guidelines for the further design of related technologies. Comment: Published in the Journal Data & Polic

    A Middleware Architecture for Intrusion Tolerant Service Replication

    This paper presents a novel combination of known techniques for building a middleware which can support service replication in a hostile environment where a node can get corrupted and fail arbitrarily and message transfer delays cannot be accurately bounded. Using localised replication and output comparison, fail-arbitrary behaviour is reduced to fail-signal: the middleware process of a corrupted server site fails only by emitting a fail-signal, and eventually fails permanently. With this failure mode, it is possible to avoid the FLP impossibility result which applies only for crash failures; specifically, the termination of a deterministic asynchronous order protocol can be guaranteed even if network delays fluctuate arbitrarily (due to network intrusions) for an indefinite period. We show how reduction to fail-signal is achieved and present a deterministic message-ordering protocol. We then argue that several well-known crash-tolerant order protocols can be re-used with little re-design within the proposed middleware.

    Enhancing Replica Management Services to Cope with Group Failures

    In a distributed system, replication of components, such as objects, is a well known way of achieving availability. For increased availability, crashed and disconnected components must be replaced by new components on available spare nodes. This replacement results in the membership of the replicated group 'walking' over a number of machines during system operation. In this context, we address the problem of reconfiguring a group after the group as an entity has failed. Such a failure is termed a group failure which, for example, can be the crash of every component in the group or the group being partitioned into minority islands. The solution assumes crash-proof storage, and eventual recovery of crashed nodes and healing of partitions. It guarantees that (i) the number of groups reconfigured after a group failure is never more than one, and (ii) the reconfigured group contains a majority of the components which were members of the group just before the group failure occurred, so that the loss of state information due to a group failure is minimal. Though the protocol is subject to blocking, it remains efficient in terms of communication rounds and use of stable store, during both normal operations and reconfiguration after a group failure.
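
Guarantee (ii) in the abstract rests on a majority condition, which can be sketched as a simple check. This is not the paper's reconfiguration protocol; `contains_majority`, the member names and the example islands are hypothetical. It assumes the membership just before the failure (`last_view`) is known, for instance from crash-proof storage.

```python
def contains_majority(candidate, last_view):
    """Return True if candidate holds a strict majority of last_view's members."""
    survivors = set(candidate) & set(last_view)
    return 2 * len(survivors) > len(last_view)


# Assumed membership just before the group failure.
last_view = {"a", "b", "c", "d", "e"}

# Two disjoint sets of nodes that might try to reconfigure after recovery.
island_1 = {"a", "b"}                 # 2 of 5 old members: not a majority
island_2 = {"c", "d", "e", "spare1"}  # 3 of 5 old members plus a new spare

can_reconfigure_1 = contains_majority(island_1, last_view)
can_reconfigure_2 = contains_majority(island_2, last_view)
```

Two disjoint candidate groups cannot both contain a strict majority of the same earlier membership, so this condition fits with guarantee (i), that no more than one group is reconfigured.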